AITopics | longformer model

Collaborating Authors

longformer model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Token Limits: Assessing Language Model Performance on Long Text Classification

Sebők, Miklós, Kovács, Viktor, Bánóczy, Martin, Eriksen, Daniel Møller, Neptune, Nathalie, Roussille, Philippe

arXiv.org Artificial IntelligenceSep-30-2025

The most widely used large language models in the social sciences (such as BERT, and its derivatives, e.g. RoBERTa) have a limitation on the input text length that they can process to produce predictions. This is a particularly pressing issue for some classification tasks, where the aim is to handle long input texts. One such area deals with laws and draft laws (bills), which can have a length of multiple hundred pages and, therefore, are not particularly amenable for processing with models that can only handle e.g. 512 tokens. In this paper, we show results from experiments covering 5 languages with XLM-RoBERTa, Longformer, GPT-3.5, GPT-4 models for the multiclass classification task of the Comparative Agendas Project, which has a codebook of 21 policy topic labels from education to health care. Results show no particular advantage for the Longformer model, pre-trained specifically for the purposes of handling long inputs. The comparison between the GPT variants and the best-performing open model yielded an edge for the latter. An analysis of class-level factors points to the importance of support and substance overlaps between specific categories when it comes to performance on long text inputs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.10199

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government > Regional Government (0.93)
Law Enforcement & Public Safety (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study

Bazoge, Adrien, Morin, Emmanuel, Daille, Beatrice, Gourraud, Pierre-Antoine

arXiv.org Artificial IntelligenceFeb-26-2024

Recently, pretrained language models based on BERT have been introduced for the French biomedical domain. Although these models have achieved state-of-the-art results on biomedical and clinical NLP tasks, they are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes. In this paper, we present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture. We conducted evaluations of these models on 16 downstream tasks spanning both biomedical and clinical domains. Our findings reveal that further pre-training an English clinical model with French biomedical texts can outperform both converting a French biomedical BERT to the Longformer architecture and pre-training a French biomedical Longformer from scratch. The results underscore that long-sequence French biomedical models improve performance across most downstream tasks regardless of sequence length, but BERT based models remain the most efficient for named entity recognition tasks.

architecture, classification, dataset, (15 more...)

arXiv.org Artificial Intelligence

2402.16689

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.05)
North America > Canada > Ontario > Toronto (0.05)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda

Kimera, Richard, Rim, Daniela N., Kirabira, Joseph, Udomah, Ubong Godwin, Choi, Heeyoul

arXiv.org Artificial IntelligenceJan-25-2024

Depression is a global burden and one of the most challenging mental health conditions to control. Experts can detect its severity early using the Beck Depression Inventory (BDI) questionnaire, administer appropriate medication to patients, and impede its progression. Due to the fear of potential stigmatization, many patients turn to social media platforms like Reddit for advice and assistance at various stages of their journey. This research extracts text from Reddit to facilitate the diagnostic process. It employs a proposed labeling approach to categorize the text and subsequently fine-tunes the Longformer model. The model's performance is compared against baseline models, including Naive Bayes, Random Forest, Support Vector Machines, and Gradient Boosting. Our findings reveal that the Longformer model outperforms the baseline models in both English (48%) and Luganda (45%) languages on a custom-made dataset.

dataset, depression, longformer model, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICTC58733.2023.10393433

2401.1424

Country:

Asia > South Korea > Gyeongsangbuk-do > Pohang (0.05)
North America > United States > New York (0.04)
Europe (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Media > News (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Add feedback

LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News

Zhang, Yongjun

arXiv.org Artificial IntelligenceNov-29-2023

Large language and vision models have transformed how social movements scholars identify protest and extract key protest attributes from multi-modal data such as texts, images, and videos. This article documents how we fine-tuned two large pretrained transformer models, including longformer and swin-transformer v2, to infer potential protests in news articles using textual and imagery data. First, the longformer model was fine-tuned using the Dynamic of Collective Action (DoCA) Corpus. We matched the New York Times articles with the DoCA database to obtain a training dataset for downstream tasks. Second, the swin-transformer v2 models was trained on UCLA-protest imagery data. UCLA-protest project contains labeled imagery data with information such as protest, violence, and sign. Both fine-tuned models will be available via \url{https://github.com/Joshzyj/llvms4protest}. We release this short technical report for social movement scholars who are interested in using LLVMs to infer protests in textual and imagery data.

language and vision model, news article, protest event, (13 more...)

arXiv.org Artificial Intelligence

2311.18241

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
Europe > Switzerland (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.31)

Add feedback

Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models

Lan, Yinyu, Wu, Yanru, Xu, Wang, Feng, Weiqiang, Zhang, Youhao

arXiv.org Artificial IntelligenceSep-15-2023

Entity-level fine-grained sentiment analysis in the financial domain is a crucial subtask of sentiment analysis and currently faces numerous challenges. The primary challenge stems from the lack of high-quality and large-scale annotated corpora specifically designed for financial text sentiment analysis, which in turn limits the availability of data necessary for developing effective text processing techniques. Recent advancements in large language models (LLMs) have yielded remarkable performance in natural language processing tasks, primarily centered around language pattern matching. In this paper, we propose a novel and extensive Chinese fine-grained financial sentiment analysis dataset, FinChina SA, for enterprise early warning. We thoroughly evaluate and experiment with well-known existing open-source LLMs using our dataset. We firmly believe that our dataset will serve as a valuable resource to advance the exploration of real-world financial sentiment analysis tasks, which should be the focus of future research. The FinChina SA dataset is publicly available at https://github.com/YerayL/FinChina-SA

dataset, language model, sentiment analysis, (12 more...)

arXiv.org Artificial Intelligence

2306.14096

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines

Arno, Henri, Mulier, Klaas, Baeck, Joke, Demeester, Thomas

arXiv.org Artificial IntelligenceAug-24-2022

Models for bankruptcy prediction are useful in several real-world scenarios, and multiple research contributions have been devoted to the task, based on structured (numerical) as well as unstructured (textual) data. However, the lack of a common benchmark dataset and evaluation strategy impedes the objective comparison between models. This paper introduces such a benchmark for the unstructured data scenario, based on novel and established datasets, in order to stimulate further research into the task. We describe and evaluate several classical and neural baseline models, and discuss benefits and flaws of different strategies. In particular, we find that a lightweight bag-of-words model based on static in-domain word representations obtains surprisingly good results, especially when taking textual data from several years into account. These results are critically assessed, and discussed in light of particular aspects of the data and the task. All code to replicate the data and experimental results will be released.

bankruptcy, bankruptcy prediction, prediction, (15 more...)

arXiv.org Artificial Intelligence

2208.11334

Country:

North America > United States (0.68)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.83)
Financial News (0.68)

Industry:

Banking & Finance (1.00)
Law > Business Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Code Comment Inconsistency Detection with BERT and Longformer

Steiner, Theo, Zhang, Rui

arXiv.org Artificial IntelligenceJul-28-2022

Comments, or natural language descriptions of source code, are standard practice among software developers. By communicating important aspects of the code such as functionality and usage, comments help with software project maintenance. However, when the code is modified without an accompanying correction to the comment, an inconsistency between the comment and code can arise, which opens up the possibility for developer confusion and bugs. In this paper, we propose two models based on BERT (Devlin et al., 2019) and Longformer (Beltagy et al., 2020) to detect such inconsistencies in a natural language inference (NLI) context. Through an evaluation on a previously established corpus of comment-method pairs both during and after code changes, we demonstrate that our models outperform multiple baselines and yield comparable results to the state-of-the-art models that exclude linguistic and lexical features. We further discuss ideas for future research in using pretrained language models for both inconsistency detection and automatic comment updating.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2207.14444

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > Puerto Rico (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry: Information Technology (0.48)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Text Guide: Improving the quality of long text classification by a text selection method based on feature importance

Fiok, Krzysztof, Karwowski, Waldemar, Gutierrez, Edgar, Davahli, Mohammad Reza, Wilamowski, Maciej, Ahram, Tareq, Al-Juaid, Awad, Zurada, Jozef

arXiv.org Artificial IntelligenceApr-15-2021

The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. Future experiments may reveal additional benefits provided by this new method.

experiment, text guide, transformer model, (15 more...)

arXiv.org Artificial Intelligence

2104.07225

Country:

North America > United States > Florida > Orange County > Orlando (0.28)
Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Middle East > Saudi Arabia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback